Back-Translation for Discovering Distant Protein Homologies

Authors

  • Marta Gîrdea
  • Laurent Noé
  • Gregory Kucherov
Abstract

Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are also involved in the divergence, homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations between two proteins that accounts for the frameshift and point mutations that may have affected their coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein. The goal is to determine the two putative DNA sequences that have the best-scoring alignment under a scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, as confirmed by biologically significant examples.
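The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the plain match/mismatch/gap scores, the function names (`translate`, `next_bases`, `best_alignment_score`), and the example sequences are all assumptions made for illustration, and the paper's graph representation and scoring system are more elaborate. Each protein is treated as an implicit graph of all its back-translations, where a state is a nucleotide position plus the prefix of the codon chosen so far. A dynamic program then searches for the pair of putative DNA sequences with the best global alignment score, so that single-nucleotide gaps can absorb a frameshift.

```python
from functools import lru_cache
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Reverse map: amino acid -> all codons that encode it.
CODONS = {}
for codon, aa in CODON_TABLE.items():
    CODONS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[k:k + 3]] for k in range(0, len(dna) - 2, 3))


def next_bases(protein, i, prefix):
    """Bases allowed at nucleotide position i, given the codon prefix chosen so far."""
    return {c[len(prefix)] for c in CODONS[protein[i // 3]] if c.startswith(prefix)}


def best_alignment_score(prot_a, prot_b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score over all pairs of back-translated DNA sequences.

    Illustrative scores only (an assumption, not the paper's scoring system).
    Uses recursion, so it is meant for short sequences.
    """
    len_a, len_b = 3 * len(prot_a), 3 * len(prot_b)

    def step(prefix, base):
        # A codon prefix resets to "" once its third base has been chosen.
        return "" if len(prefix) == 2 else prefix + base

    @lru_cache(maxsize=None)
    def go(i, pa, j, pb):
        if i == len_a and j == len_b:
            return 0
        opts_a = next_bases(prot_a, i, pa) if i < len_a else ()
        opts_b = next_bases(prot_b, j, pb) if j < len_b else ()
        best = float("-inf")
        for x in opts_a:
            # Base x of sequence A aligned to a gap.
            best = max(best, gap + go(i + 1, step(pa, x), j, pb))
            for y in opts_b:
                # Base x of sequence A aligned to base y of sequence B.
                s = match if x == y else mismatch
                best = max(best, s + go(i + 1, step(pa, x), j + 1, step(pb, y)))
        for y in opts_b:
            # Base y of sequence B aligned to a gap.
            best = max(best, gap + go(i, pa, j + 1, step(pb, y)))
        return best

    return go(0, "", 0, "")


# Hypothetical example: dna_b is dna_a with one G deleted and a C appended,
# so everything after the first codon is read in a shifted frame.
dna_a = "ATGGCTAAAGGTTGG"
dna_b = "ATGCTAAAGGTTGGC"
prot_a, prot_b = translate(dna_a), translate(dna_b)
print(prot_a, prot_b)
print(best_alignment_score(prot_a, prot_b))
```

In this toy case the two proteins share only two residues in an ungapped protein comparison, while the search over back-translations recovers an alignment in which all but two columns match. The memoized state space grows with the product of the two sequence lengths, which is why a compact graph representation matters for real proteins.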


Similar resources

Primary structure of human C-reactive protein.

The complete amino acid sequence of human C-reactive protein has been established. Distant homologies to C3 homology region in the CH2 domain of IgG and to C3a anaphylotoxin have been noted. No homology to other immunoglobulin homology regions or to the same homology region in other heavy chains was observed. The previously reported homologies between rabbit and human C-reactive protein and pro...


The Expression of FOXE-1 and STIP-1 in Papillary Thyroid Carcinoma and Their Relationship with Patient Prognosis

Background and Objective: Most patients with papillary carcinoma of the thyroid gland (PTC) have a favorable outcome, but because the tumor has a strong capability to invade nearby tissues, there is a great risk of regional and distal lymph node (LN) metastases, which are related to poor prognostic parameters, early recurrences, and distant metastasis that lead to poor patient outcomes. ...


Optimization of a new score function for the detection of remote homologs.

The growth in protein sequence data has placed a premium on ways to infer structure and function of the newly sequenced proteins. One of the most effective ways is to identify a homologous relationship with a protein about which more is known. While close evolutionary relationships can be confidently determined with standard methods, the difficulty increases as the relationships become more dis...


Features Extraction For Protein Homology Detection Using Hidden Markov Models Combining Scores

A few years ago, Jaakkola and Haussler published a method of combining generative and discriminative approaches for detecting protein homologies. The method was a variant of support vector machines using a new kernel function called the Fisher kernel. They begin by training a generative hidden Markov model for a protein family. Then, using the model, they derive a vector of features called Fisher sc...


Discovering Domains Mediating Protein Interactions

Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi domain proteins and the interaction between them is often defined by the pairs of their domains. Most of the former studies focus only on interacting domain pairs. However they do not consider the in...




Publication date: 2009